Abstract

In their paper on the "chasm at depth four", Agrawal and Vinay
have shown that polynomials in $m$ variables of degree $O(m)$ which
admit arithmetic circuits of size $2^{o(m)}$ also admit arithmetic circuits
of depth four and size $2^{o(m)}$. This theorem shows that for problems
such as arithmetic circuit lower bounds or black-box derandomization 
of identity testing, the case of depth four circuits is in a certain sense 
the general case. 

In this paper we show that smaller depth four circuits can be obtained
if we start from polynomial size arithmetic circuits. For instance,
we show that if the permanent of $n \times n$ matrices has circuits of
size polynomial in $n$, then it also has depth 4 circuits of size $n^{O(\sqrt{n}\log n)}$.
If the original circuit uses only integer constants of polynomial size, 
then the same is true of the resulting depth four circuit. These results 
have potential applications to lower bounds and deterministic identity 
testing, in particular for sums of products of sparse univariate polyno- 
mials. We also use our techniques to reprove two results on: 

- The existence of nontrivial boolean circuits of constant depth for 
languages in LOGCFL. 

- Reduction to polylogarithmic depth for arithmetic circuits of 
polynomial size and polynomially bounded degree. 



*UMR 5668 ENS Lyon, CNRS, UCBL, INRIA. 

^This work was done during a visit to the Fields Institute and to the University of 
Toronto's Department of Computer Science. 






1 Introduction 



Agrawal and Vinay have shown that polynomials of degree $d = O(m)$ in
$m$ variables which admit nontrivial arithmetic circuits also admit nontrivial
arithmetic circuits of depth four [1]. Here, "nontrivial" means of size
$2^{o(d + d\log\frac{m}{d})}$. The resulting depth 4 circuits are $\Sigma\Pi\Sigma\Pi$ arithmetic formulas:
the output gate (at depth 4) and the gates at depth 2 are addition gates,
and the other gates are multiplication gates. This theorem shows that for 
problems such as arithmetic circuit lower bounds or black-box derandomiza- 
tion of identity testing, the case of depth four circuits is in a certain sense 
the general case. 

But what if we start from arithmetic circuits of size smaller than $2^{o(m)}$
(for instance, of size polynomial in $m$)? It is reasonable to expect that the
size of the corresponding depth four circuits will be reduced accordingly, but
such a result cannot be found in [1]. One of the main results of this paper is
a depth reduction theorem for VP families (i.e., families $(f_n)$ of polynomials
of degree and arithmetic circuit complexity polynomially bounded in $n$). We
show in Theorem 5 that any VP family $(f_n)$ has depth 4 arithmetic formulas
of size $n^{O(\sqrt{d_n}\log d_n)}$, where $d_n$ is the degree of $f_n$. For instance, this result
shows that if the permanent of $n \times n$ matrices has circuits of size polynomial
in $n$, then it also has depth 4 formulas of size $n^{O(\sqrt{n}\log n)}$. This is potentially
useful for a lower bound proof: to show that the permanent does not have
polynomial size circuits, we "only" have to show that it does not have depth
4 formulas of size $n^{O(\sqrt{n}\log n)}$. This is still certainly far away from the known
lower bounds for constant depth arithmetic circuits: currently we have
superpolynomial lower bounds for the permanent for circuits of depth 3 only,
and only in finite fields [5, 6]. In the restricted setting of multilinear
arithmetic circuits, superpolynomial lower bounds can be obtained for circuits of
arbitrary constant depth [17]. We do not address the issue of multilinearity
in this paper. Note however that the results in [16, 17] suggest that the
bound in Theorem 5 could be fairly close to optimal, at least for multilinear
circuits. Indeed, a polynomial $f$ of degree $3n - 1$ in $O(n^3)$ variables with
multilinear arithmetic circuits of polynomial size is constructed in Section 4
of [16]. By Theorem 4.3 of [16] and Theorem 5.1 of [17], all multilinear
depth 4 circuits for $f$ are of size at least $n^{\Omega(\sqrt{n}/\log n)}$. This shows that the
exponent $\sqrt{d_n}$ in Theorem 5 cannot be removed if we insist on a reduction
to depth 4 that would preserve multilinearity. Note that for reduction to
depth $\log^2(n)$, preservation of multilinearity is indeed possible [16].

We also perform an analysis of the size of the integer constants used
by the depth 4 circuit simulating a given polynomial size circuit (a similar
analysis for the construction in [1] has not been carried out yet to the
author's knowledge). Roughly speaking, we show that reduction to depth 4
does not require the introduction of large constants. In particular, we give
an analogue of Theorem 5 for $\mathrm{VP}^0$ (this is a constant-free
version of VP). This result is used in [10], where we show that black-box
derandomization of identity testing for sums of products of sparse univari- 
ate polynomials with sparse coefficients would imply a lower bound for the 
permanent. Finally, we give applications of our depth reduction techniques 
to boolean circuit complexity and to the construction of arithmetic circuits 
of polylogarithmic depth. 

1.1 Main Ideas and Comparison with Previous Work 

The main depth reduction result in [1] is as follows.

Theorem 1 Let $P(x_1, \ldots, x_m)$ be a polynomial of degree $d = O(m)$ over a
field $F$. If there exists an arithmetic circuit of size $2^{o(d + d\log\frac{m}{d})}$ for $P$ then
there exists a depth 4 arithmetic circuit of size $2^{o(d + d\log\frac{m}{d})}$.

Theorem 2.4 in [1] also provides some bounds on the fan-in of the gates in
the resulting depth 4 circuits.

For multilinear polynomials, their result (Corollary 2.5 in [1]) reads as
follows:

Corollary 1 A multilinear polynomial in $m$ variables which has an arithmetic
circuit of size $2^{o(m)}$ also has a depth 4 arithmetic circuit of size $2^{o(m)}$.

We give the (simple) proof, which is omitted from [1]. For $d = m$ the result
is clear since the exponent $d + d\log\frac{m}{d}$ in Theorem 1 is equal to $m$. Consider
now the case of a polynomial $P(X_1, \ldots, X_m)$ of degree $d < m$, having a
circuit of size $2^{o(m)}$. Let $Q = P + \prod_{i=1}^m X_i$. Since the number of variables
of $Q$ is equal to its degree, we are back to the first case: $Q$ has a depth
four circuit of size $2^{o(m)}$. We can obtain a circuit of size $2^{o(m)}$ for $P$ by
subtracting the product $\prod_{i=1}^m X_i$ (this requires only $m$ additional arithmetic
operations). Note that this corollary and its proof hold more generally for
any (possibly not multilinear) polynomial of degree $d \le m$.

By specializing the multilinear polynomial to the permanent, Agrawal
and Vinay then state in Corollary 2.6 that if every depth 4 arithmetic circuit
for the permanent requires exponential size, the same is true for arithmetic
circuits of unbounded depth. It is not made precise in [1] what "exponential
size" exactly means. In this context (arithmetic complexity of the permanent)
the most standard interpretation is probably that an exponential size
circuit for the $n \times n$ permanent is of size $2^{\Omega(n)}$ (note that the number of
variables is $m = n^2$). With this interpretation, it is not clear why Corollary
2.6 of [1] would follow from Theorem 1 or Corollary 1.

Since the permanent of an $n \times n$ matrix has degree $d = n$ and $m = n^2$
variables, we can deduce the following from Theorem 1: if there exists an
arithmetic circuit of size $2^{o(n\log n)}$ for the $n \times n$ permanent then there
also exists a depth 4 arithmetic circuit of size $2^{o(n\log n)}$. This statement is not very
useful since we already know (by Ryser's formula [18]) that the permanent
has depth 3 arithmetic formulas of size $O(n2^n)$. Note that applying Corollary 1
directly to the permanent would give an even worse bound (namely,
we would obtain depth 4 formulas of size $2^{o(n^2)}$). As explained earlier, we
can show that if the permanent has polynomial size circuits it must also
have depth 4 formulas of size $n^{O(\sqrt{n}\log n)}$. This result does not follow from
Theorem 1. On the other hand, our results are weaker than Theorem 1 if
we start from a very large circuit. Indeed, as explained below, we can only
show that a circuit of size $t$ and degree $d$ has an equivalent depth 4 circuit
of size $t^{O(\sqrt{d}\log d)}$. This does not imply Theorem 1.

Before describing their general depth reduction algorithm, Agrawal and 
Vinay begin with the special case of matrix powering. For this problem there 
is a very simple and elegant reduction to depth four. Then they treat the 
general case with an apparently different approach: their construction builds 
on the depth reduction algorithm of Allender, Jiao, Mahajan and Vinay [3], 
who gave a uniform version of the depth reduction result due to Valiant, 
Skyum, Berkowitz and Rackoff [23]. In this paper we show that the matrix
powering idea is powerful enough to handle arbitrary polynomial-size
arithmetic circuits. Arithmetic branching programs and weakly skew circuits are
the main tools that we use to reduce the evaluation of arbitrary arithmetic
circuits to matrix powering. These models are known to capture the complexity
of a number of problems from linear algebra, such as matrix powering,
iterated matrix multiplication or computation of the determinant [19, 12].

1.2 Organization of the paper 

In Section 2 we present the two main computation models that we will use:
arithmetic circuits and arithmetic branching programs. We define some of
the corresponding complexity classes, and give some basic properties. In
Section 3, building on a construction of Malod and Portier [12], we give an
efficient simulation of arithmetic circuits by arithmetic branching programs.
Compared to [12], we take extra care to construct branching programs of
small depth because the square root of the depth appears in the exponent of
the size estimate for the final depth 4 circuit. In Section 4 we reduce branching
programs to depth 4 circuits using the matrix powering idea from [1].
Then we state our main technical result in Theorem 3. We show in particular
that an arithmetic circuit of size $t$ and formal degree $d$ has a depth 4 circuit
of size $t^{O(\sqrt{d}\log d)}$. We draw some consequences for depth reduction of VP
families in Section 5 and for depth reduction of $\mathrm{VP}^0$ families in Section 6.

In Section 7 we give an application of these techniques to boolean circuit
complexity. Namely, we show that languages in LOGCFL have constant-depth
boolean circuits of size $2^{n^\epsilon}$ (and we briefly present the history of this
result).

Finally, we show in Section 8 that the same tools can be used to give a
very simple (but suboptimal) proof of the fact that for circuits of polynomially
bounded size and degree, reduction to polylogarithmic depth can be
achieved while preserving polynomial size [23].




2 Arithmetic Circuits and Branching Programs 

We recall that an arithmetic circuit contains addition and multiplication 
gates. In addition to these arithmetic gates there are input gates, labelled 
by variables or constants from some field K. An output gate is of fan-out 
zero. We often assume that there is a single output gate. In this case an
arithmetic circuit therefore represents a polynomial with coefficients in K.
Without loss of generality, we can and will assume that every input gate has
fan-out at most 1 (several input gates can be labeled with the same variable
or constant if necessary).

We often assume that the arithmetic gates have arity 2, but in constant-depth
circuits we naturally allow addition and multiplication gates of unbounded
fan-in (we often also give some explicit upper bounds on the fan-in, see
for instance Theorem 3). In some of our intermediate constructions (e.g.
Proposition 2) we also work with weighted addition gates.

Definition 1 An $n$-ary weighted addition gate computes a linear combination
$a_1x_1 + \cdots + a_nx_n$ of its inputs $x_1, \ldots, x_n$. Here $a_i$ is the weight associated
to the $i$-th input of the gate. The total weight of the gate is $\sum_{i=1}^n |a_i|$.

For instance, a subtraction gate is a binary weighted addition gate with 
weights (1,-1). We sometimes refer to binary unweighted addition gates as 
"ordinary addition gates". The size of a circuit is its total number of gates 
(including input gates). 

Definition 2 Fix a field $K$. A sequence $(f_n)$ of polynomials with coefficients
in $K$ belongs to VP if there exists a polynomial $p(n)$ and a sequence $(C_n)$ of
arithmetic circuits such that $\deg(f_n) \le p(n)$, $C_n$ computes $f_n$ and is of size
at most $p(n)$.

The size constraint implies in particular that f n depends on polynomially 
many variables. The above definition is fairly robust. For instance we obtain 
the same class with circuits using gates of fan-in 2 or of unbounded fan-in, 
weighted or unweighted addition gates. 

An arithmetic formula is a circuit where all gates are of fan-out one, 
except of course the output gate. In the constant depth setting, arithmetic 
formulas and arithmetic circuits are polynomially related ([17], Claim 2.2).

The complexity of several problems from linear algebra such as iterated
matrix multiplication or computing the determinant is captured by a restricted
class of arithmetic circuits called weakly skew circuits [19, 12]. Let
$C$ be an arithmetic circuit where all multiplication gates are binary. A
multiplication gate $\alpha$ in $C$ is said to be disjoint if at least one of its two subcircuits
is disjoint from the remainder of $C$, except of course for the edge from the
subcircuit to $\alpha$ (removing this edge would therefore disconnect $C$). The
circuit is weakly skew if its multiplication gates are all disjoint. This definition
is usually given only for circuits where all addition gates are binary
unweighted, but we will use our slightly more general definition instead (see
Propositions 2 and 3).

There is also a closely related notion of skew circuits [19, 8, 9]: a circuit
with binary multiplication gates is skew if for every multiplication gate at
least one of the two incoming edges comes from an input of the circuit. Since
we have assumed that input gates have fan-out at most 1, every skew
circuit is also weakly skew.

A circuit where the only constants are from the set $\{0, -1, 1\}$ is said
to be constant-free. A constant-free circuit represents a polynomial in
$\mathbb{Z}[X_1, \ldots, X_n]$, where $X_1, \ldots, X_n$ are the variables labelling the input gates.

The constant-free model was systematically studied by Malod. In
particular, he defined a class $\mathrm{VP}^0$ of polynomial families that are "easy to
compute" by constant-free arithmetic circuits. First we need to recall the
notion of formal degree:

(i) The formal degree of an input gate is equal to 1. 

(ii) The formal degree of an addition gate is the maximum of the formal 
degrees of its incoming gates, and the formal degree of a multiplication 
gate is the sum of these formal degrees. 

Finally, the formal degree of a circuit is equal to the formal degree of its
output gate. This is obviously an upper bound on the degree of the polynomial
computed by the circuit. Note that this definition can be applied to
circuits with weighted addition gates of arbitrary fan-in. For instance, the
polynomial $x - 2y$ can be computed by a circuit containing one ordinary
addition gate, one multiplication gate and three inputs labeled by $x$, $y$ and
the constant $-2$. This circuit has formal degree two. The same polynomial
can be computed by another circuit containing a binary weighted addition
gate (of total weight $1 + |-2| = 3$) with inputs $x$ and $y$. The second circuit
has formal degree 1.
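
The two circuits for x - 2y described above can be checked mechanically. The following is a minimal sketch, not part of the original construction: the class names `Input`, `Add`, `Mul` and the helper functions are hypothetical names chosen for illustration, and the formal degree rules are the ones recalled in items (i) and (ii).

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Input:
    label: str  # a variable name or a constant, written as text


@dataclass(frozen=True)
class Add:
    inputs: Tuple["Gate", ...]
    weights: Tuple[int, ...]  # all equal to 1 for an ordinary addition gate


@dataclass(frozen=True)
class Mul:
    inputs: Tuple["Gate", ...]


Gate = Union[Input, Add, Mul]


def formal_degree(gate: Gate) -> int:
    """1 for inputs, max over inputs for additions, sum for multiplications."""
    if isinstance(gate, Input):
        return 1
    degrees = [formal_degree(g) for g in gate.inputs]
    return max(degrees) if isinstance(gate, Add) else sum(degrees)


def total_weight(gate: Add) -> int:
    """Sum of the absolute values of the weights of an addition gate."""
    return sum(abs(w) for w in gate.weights)


x, y, minus_two = Input("x"), Input("y"), Input("-2")
# First circuit: x + (-2) * y, using an ordinary addition gate.
first = Add(inputs=(x, Mul(inputs=(minus_two, y))), weights=(1, 1))
# Second circuit: one binary weighted addition gate with weights (1, -2).
second = Add(inputs=(x, y), weights=(1, -2))

print(formal_degree(first), formal_degree(second), total_weight(second))
```

Under these assumptions the first circuit has formal degree 2, the second has formal degree 1, and the weighted gate has total weight 3, in agreement with the text.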

Definition 3 A sequence $(f_n)$ of polynomials belongs to $\mathrm{VP}^0$ if there exists
a polynomial $p(n)$ and a sequence $(C_n)$ of constant-free arithmetic circuits
(with unweighted addition gates) such that $C_n$ computes $f_n$ and is of size and
formal degree at most $p(n)$.

The constraint on the formal degree forbids the computation of polynomials
of high degree such as $X^{2^n}$; it also forbids the computation of large
constants such as $2^{2^n}$. The class $\mathrm{VP}^0$ is therefore a strict subset of VP (over
the field of rational numbers, or more generally any field of characteristic 0).
As for VP, we obtain the same class with gates of fan-in 2 or of unbounded
fan-in, but of course we cannot allow addition gates with arbitrary weights.
We can however allow subtraction gates:

Proposition 1 Let $C$ be a constant-free circuit of size $t$ and formal degree
$d$, where the arithmetic gates are multiplication, unweighted addition or
subtraction gates (all of fan-in 2).

There is an equivalent constant-free circuit $C'$ of formal degree $d+1$ and
size at most $6t + 3$, where the arithmetic gates are binary multiplications or
ordinary additions.

Proof. We need to get rid of subtraction gates. A first idea would be to
write each subtraction $x - y$ as $x + (-1) \times y$, but the cumulative effect of the
multiplications $(-1) \times y$ could lead to an increase in the formal degree by more
than 1. Instead we will represent each gate $\alpha$ in $C$ by a pair of gates $(\alpha_1, \alpha_2)$
in $C'$. The output of $\alpha$ will be equal to the difference of the outputs of $\alpha_1$ and
$\alpha_2$. An input $x$ in $C$ can be represented by the pair $(x, 0)$. To simulate the
arithmetic operations in $C$ we use the following rules:
$(\alpha_1 - \alpha_2) + (\beta_1 - \beta_2) = (\alpha_1 + \beta_1) - (\alpha_2 + \beta_2)$;
$(\alpha_1 - \alpha_2) - (\beta_1 - \beta_2) = (\alpha_1 + \beta_2) - (\alpha_2 + \beta_1)$;
$(\alpha_1 - \alpha_2) \times (\beta_1 - \beta_2) = (\alpha_1 \times \beta_1 + \alpha_2 \times \beta_2) - (\alpha_2 \times \beta_1 + \alpha_1 \times \beta_2)$.
A straightforward induction shows that the gates in a pair $(\alpha_1, \alpha_2)$ will have
the same formal degree as the gate $\alpha$ that they represent. Finally, to complete
the construction of $C'$ we come back to our first idea: if $(\alpha_1, \alpha_2)$ is the
pair representing the output gate of $C$, we write the difference $\alpha_1 - \alpha_2$ as
$\alpha_1 + (-1) \times \alpha_2$. This increases the formal degree by 1. Each arithmetic
operation in $C$ is simulated by at most 6 operations in $C'$, and we need 3
additional gates to perform the final subtraction. □
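
The pair representation used in this proof can be tried out on integer inputs. The sketch below is only an illustration under stated assumptions: the circuit encoding (a topologically ordered list of tuples) and the function name `simulate_without_subtraction` are hypothetical and not taken from the paper. It applies the three rewriting rules gate by gate, using additions and multiplications only, and performs a single multiplication by -1 at the very end.

```python
def simulate_without_subtraction(circuit, values):
    """Evaluate a circuit with +, -, * gates using only + and *.

    circuit: gates in topological order, each either ("in", name) or
             (op, i, j) with op in {"+", "-", "*"} and i, j indices of
             earlier gates.
    Returns the list of pairs (a1, a2), where gate k has value a1 - a2,
    and the number of + and * operations performed.
    """
    pairs = []
    operations = 0
    for gate in circuit:
        if gate[0] == "in":
            pairs.append((values[gate[1]], 0))  # input x becomes (x, 0)
            continue
        op, i, j = gate
        (a1, a2), (b1, b2) = pairs[i], pairs[j]
        if op == "+":
            pairs.append((a1 + b1, a2 + b2))
            operations += 2
        elif op == "-":
            pairs.append((a1 + b2, a2 + b1))
            operations += 2
        elif op == "*":
            pairs.append((a1 * b1 + a2 * b2, a2 * b1 + a1 * b2))
            operations += 6  # four multiplications and two additions
        else:
            raise ValueError(f"unknown gate type: {op}")
    return pairs, operations


# (x - y) * (x - y) - x, with a second input gate for the last use of x.
circuit = [("in", "x"), ("in", "y"), ("-", 0, 1), ("*", 2, 2),
           ("in", "x"), ("-", 3, 4)]
pairs, operations = simulate_without_subtraction(circuit, {"x": 3, "y": 5})
out_pos, out_neg = pairs[-1]
result = out_pos + (-1) * out_neg  # the only multiplication by -1
print(result, operations)
```

With x = 3 and y = 5 the original circuit evaluates to (3 - 5)^2 - 3 = 1, and the simulation uses 10 operations for 3 arithmetic gates, within the bound of 6 operations per gate.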

This modest increase in the formal degree cannot be avoided: without
subtraction gates there is no better way to compute the polynomial $f(x) = -x$
than by the formula $f(x) = -1 \times x$, which is of formal degree 2.

Finally we define the notion of arithmetic branching program. This is an
edge-weighted directed acyclic graph with two distinguished vertices $s$ and $t$.
The output of the branching program is by definition equal to the sum of the 
weights of all paths from s to t, where the weight of a path is the product 
of the weights of its edges. In this paper we assume that the edge weights 
are constants from some field K or variables. Like an arithmetic circuit, a 
branching program therefore represents a polynomial with coefficients in K. 
The depth of a branching program is the length (in number of edges) of 
the longest path from s to t. The term arithmetic (or algebraic) branching 
program goes back at least to [15, 4], but these objects were used implicitly
much earlier, for instance in [21]. Skew circuits, weakly skew circuits and
arithmetic branching programs are essentially equivalent models. Indeed, as
shown in [9], they simulate each other with only linear overhead (see [8] for
the multilinear case). 
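
As an illustration of this definition, the output and the depth of a small branching program can be computed by a single pass over the edges. This is a sketch only: the function names and the edge-list encoding are hypothetical, the edge weights are plain integers standing in for variables and constants, and the vertices are assumed to be numbered in topological order.

```python
def abp_value(num_vertices, edges, s, t):
    """Sum, over all paths from s to t, of the product of the edge weights.

    edges: list of (u, v, w) with u < v (topological numbering assumed).
    """
    weight = [0] * num_vertices
    weight[s] = 1  # the empty path from s to itself
    # Edges are processed by increasing source, so weight[u] is final
    # by the time an edge leaving u is used.
    for u, v, w in sorted(edges, key=lambda e: e[0]):
        weight[v] += weight[u] * w
    return weight[t]


def abp_depth(num_vertices, edges, s, t):
    """Length, in number of edges, of the longest path from s to t."""
    longest = [None] * num_vertices
    longest[s] = 0
    for u, v, _ in sorted(edges, key=lambda e: e[0]):
        if longest[u] is not None:
            if longest[v] is None or longest[u] + 1 > longest[v]:
                longest[v] = longest[u] + 1
    return longest[t]


x, y = 2, 5  # sample values for two variables
edges = [(0, 1, x), (0, 2, y), (1, 2, 3), (1, 3, x), (2, 3, 2)]
print(abp_value(4, edges, 0, 3), abp_depth(4, edges, 0, 3))
```

The three paths from vertex 0 to vertex 3 contribute x*x = 4, y*2 = 10 and x*3*2 = 12, so the program outputs 26, and its depth is 3.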

3 From Circuits to Branching Programs 

We first recall Lemma 4 from [12].

Lemma 1 Let $C$ be a circuit of size $t$ and formal degree $d$, containing only
binary unweighted arithmetic gates. There exists a weakly skew circuit $C'$ of
formal degree $d$ and size at most $t^{\log 2d}$ which computes the same polynomial.

The fact that $C'$ has the same formal degree as $C$ is not explicitly stated in [12],
but it can be checked that their construction does satisfy this additional
property (more on this in the proof of Proposition 2). We would like to
apply this construction not to $C$ itself, but to a "normal form" of $C$ containing
weighted addition gates. We begin with an easy lemma.

Lemma 2 Let $C$ be a circuit made only of input gates and (ordinary) addition
or subtraction gates. Each gate of $C$ is equivalent to a weighted addition
gate of total weight at most $2^s$, where $s$ is the number of arithmetic gates in
$C$.

Proof. By induction on $s$. The result is true for $s \le 1$ since an input gate can
be viewed as a unary weighted addition gate of weight 1, and an ordinary
addition or subtraction gate as a binary weighted addition gate of weight 2.
For $s > 1$, consider an addition or subtraction gate which is an output of
$C$. By induction hypothesis each of the two inputs of the gate computes
a function of the form $\sum_{i=1}^n a_i x_i$ where $x_1, \ldots, x_n$ are the inputs of $C$ and
$\sum_i |a_i| \le 2^{s-1}$. Therefore the output gate computes a function of the same
form with total weight at most $2^s$. □
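
The bound of this lemma can be observed on a small example. The following sketch makes two simplifying assumptions that are not in the paper: circuits are encoded as topologically ordered lists of tuples, and input gates carrying the same label are merged into one coefficient (which can only decrease the total weight). The names `linear_forms` and `total_weight` are hypothetical.

```python
def linear_forms(circuit):
    """Linear form computed by every gate of a circuit with + and - gates.

    circuit: gates in topological order, each either ("in", name) or
             (op, i, j) with op in {"+", "-"}.
    Returns, for each gate, a dict mapping input names to coefficients.
    """
    forms = []
    for gate in circuit:
        if gate[0] == "in":
            forms.append({gate[1]: 1})
            continue
        op, i, j = gate
        sign = 1 if op == "+" else -1
        form = dict(forms[i])
        for name, coeff in forms[j].items():
            form[name] = form.get(name, 0) + sign * coeff
        forms.append(form)
    return forms


def total_weight(form):
    """Sum of the absolute values of the coefficients."""
    return sum(abs(c) for c in form.values())


# x, 2x, 4x, y, 4x - y: two doublings followed by one subtraction.
circuit = [("in", "x"), ("+", 0, 0), ("+", 1, 1), ("in", "y"), ("-", 2, 3)]
forms = linear_forms(circuit)
s = sum(1 for gate in circuit if gate[0] != "in")  # arithmetic gates
print(forms[-1], total_weight(forms[-1]), 2 ** s)
```

Here s = 3 and the output gate computes 4x - y, of total weight 5, below the bound 2^3 = 8. The gate computing 4x sits on top of 2 arithmetic gates and has weight exactly 2^2, so repeated doubling shows the exponential bound is attained.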

Lemma 3 Let $C$ be a circuit containing $s$ (weighted) addition gates and $m$
multiplication gates. There is an equivalent circuit $C^+$ such that:

(i) $C^+$ contains at most $s$ addition gates and $m$ multiplication gates.

(ii) Any input to an addition gate is an input of $C^+$ or the output of a
multiplication gate (in other words, the output of an addition gate can
be fed only to multiplication gates).

(iii) If all the addition gates of $C$ are ordinary additions or subtractions,
the total weight of every addition gate of $C^+$ is at most $2^s$.

(iv) $C^+$ is of the same formal degree as $C$.

In this lemma and elsewhere in the paper, "equivalent" means that $C^+$ computes
the same polynomial as $C$.

Proof of Lemma 3. We will keep the same multiplication gates in $C^+$ as in
$C$. Consider a multiplication gate in $C$ having at least one addition gate $\gamma$ as
an input. We can view $\gamma$ as the output of a maximal subcircuit which does
not contain any internal multiplication gate (the inputs to the subcircuit are
therefore inputs of $C$ or multiplication gates). The output of this subcircuit
is a linear function of its inputs. We can therefore replace the subcircuit
by a single (weighted) addition gate $\gamma'$. Moreover, in the case where all the
addition gates of $C$ are ordinary additions or subtractions, $\gamma'$ can be taken
of weight at most $2^s$ by Lemma 2.

We perform this replacement simultaneously for all addition gates of $C$
feeding into a multiplication gate. If the output of $C$ is a multiplication
gate, we are done. If the output is an addition gate, we likewise replace its
maximal subcircuit by a weighted addition gate. The resulting circuit $C^+$
satisfies properties (i), (ii) and (iii).

A straightforward induction shows that every multiplication gate $\alpha$ of $C$
has the same formal degree as the corresponding gate in $C^+$, and that if $\alpha$ has
an input $\gamma$ which is an addition gate, the formal degree of the corresponding
gate $\gamma'$ in $C^+$ will be equal to that of $\gamma$. Hence property (iv) is satisfied as
well. □

The same transformation as in Lemma 1 can be applied to $C^+$ instead of $C$.
The resulting weakly skew circuit contains weighted addition gates.

Proposition 2 Let $C$ be a circuit of size $t$ and formal degree $d$ where all
multiplication gates are binary. There exists a weakly skew circuit $C'$ of
degree $d$ and size at most $t^{\log 2d}$ which computes the same polynomial. In
$C'$, any input to an addition gate is an input of the circuit or the output of
a multiplication gate. Moreover, if all the addition gates of $C$ are ordinary
additions or subtractions, the total weight of every addition gate of $C'$ is at
most $2^t$.

Proof. We only give a sketch since this is really the same construction as in
Lemma 4 of [12]. We briefly explain below why this construction preserves
properties (ii), (iii) and (iv) from Lemma 3, and refer to [12] for more details.
To achieve weak skewness, $C'$ contains multiple copies of each gate of $C^+$.
Moreover, the connection pattern of $C^+$ is preserved in the following sense.
If $\alpha'$ is a copy of a multiplication gate $\alpha$ then its two inputs $\beta'$ and $\gamma'$ are
copies of the two inputs $\beta$ and $\gamma$ of $\alpha$. Likewise, for any addition gate $\alpha$ of
$C^+$ the inputs of a copy $\alpha'$ will be copies of its inputs, and moreover $\alpha'$ and
$\alpha$ will have the same weights ([12] considers only unweighted binary addition
gates, but the general case is identical). In particular, $\alpha$ and $\alpha'$ have the same
total weight and the inputs to $\alpha'$ are inputs of $C'$ or multiplication gates.
A straightforward induction shows that every gate of $C^+$ has the same formal
degree as its copies in $C'$. □

Proposition 3 Let $C$ be a weakly skew circuit of size $m$ and formal degree
$d$, with weights of addition gates coming from some set $W$. Assume moreover
that any input to an addition gate is an input of $C$ or the output of a
multiplication gate. There exists an equivalent arithmetic branching program
$G$ of size at most $m + 1$ and depth at most $3d - 1$. The edges of $G$ are labeled
by inputs of $C$ or constants from $W$.

Proof. The construction is similar to that of ([12], Lemma 5). The main new
point is to check the depth bound. Recall from Section 2 that for every
multiplication gate $\alpha$ in $C$ we have an independent subcircuit which is connected
to the remainder of $C$ only by the arrow from the subcircuit to $\alpha$. As in [12],
we say that a gate is reusable if it does not belong to any independent
subcircuit. Also as in [12], we will prove a version of Proposition 3 for circuits
with multiple outputs.

We will show by induction that for any reusable gate $\alpha$ of $C$ there is a
vertex $t_\alpha$ in $G$ such that the weight of $(s, t_\alpha)$ is the polynomial computed
by $\alpha$. As to the depth, we will show that if $\alpha$ is an addition gate computing
a polynomial of formal degree $d_\alpha$, the depth of $t_\alpha$ in $G$ (the length of the
longest path from $s$ to $t_\alpha$) is at most $3d_\alpha - 1$; if $\alpha$ is a multiplication or input
gate, its depth is at most $3d_\alpha - 2$.

The beginning of the induction is clear: a weakly skew circuit $C$ of size
$m = 1$ is reduced to a single gate $\alpha$ labeled by some input $x$. The corresponding
graph $G$ has two nodes $s$ and $t$, with an edge from $s$ to $t$ labeled
by $x$. We take of course $t_\alpha = t$. We have $d_\alpha = 1$, and this gate is indeed at
depth $3d_\alpha - 2 = 1$.

Consider now a weakly skew circuit $C$ of size $m \ge 2$, and let $\alpha$ be one of
its output gates. Removing $\alpha$ from $C$, we obtain a circuit $C'$ of size $m - 1$.
By induction hypothesis, there is a corresponding graph $G'$ of size at most
$m$ with a distinguished vertex $s$.

If $\alpha$ is an input gate labeled by $x$, we obtain $G$ by adding a vertex $t_\alpha$ to
$G'$, and an edge from $s$ to $t_\alpha$ labeled by $x$.

Assume now that $\alpha$ is a (weighted) addition gate, with $k$ (distinct) inputs
$\alpha_1, \ldots, \alpha_k$. These $k$ gates must be reusable, so by induction hypothesis we
have vertices $t_{\alpha_i}$ in $G'$ so that the weight of $(s, t_{\alpha_i})$ is equal to the polynomial
computed by $\alpha_i$. Moreover, since $\alpha$ is an addition gate the $\alpha_i$ are multiplication
or input gates, and are therefore at depth at most $3d_{\alpha_i} - 2 \le 3d_\alpha - 2$.
We obtain $G$ by adding a new vertex $t_\alpha$ to $G'$, and $k$ new edges from the
$t_{\alpha_i}$ to $t_\alpha$ (labeled by the same weights as the incoming edges of the addition
gate $\alpha$). The weight of $(s, t_\alpha)$ in $G$ is clearly equal to the polynomial
computed by $\alpha$, and $t_\alpha$ is at depth at most $(3d_\alpha - 2) + 1 = 3d_\alpha - 1$.

Assume finally that $\alpha$ is a multiplication gate with inputs $\beta$ and $\gamma$. Let
$C_\beta$ and $C_\gamma$ be the corresponding subcircuits. Since $C$ is weakly skew, one
of the two subcircuits (say, $C_\gamma$) is independent from the rest of $C$. Hence
$m = m_\beta + m_\gamma + 1$ where $m_\beta$ and $m_\gamma$ are the sizes of $C_\beta$ and $C_\gamma$. We can apply
separately the induction hypothesis to $C_\beta$ and $C_\gamma$. This yields two graphs
$G_\beta$ and $G_\gamma$ of respective sizes at most $m_\beta + 1$ and $m_\gamma + 1$, with sources $s_\beta$
and $s_\gamma$. In these graphs there are vertices $t_\beta$ and $t_\gamma$ such that the weight
of $(s_\beta, t_\beta)$ in $G_\beta$ is equal to the polynomial computed by gate $\beta$, and the
weight of $(s_\gamma, t_\gamma)$ in $G_\gamma$ is equal to the polynomial computed by gate $\gamma$. We
construct $G$ from these two graphs by identifying $t_\beta$ and $s_\gamma$. The source of $G$
is $s = s_\beta$. This graph is of size at most $(m_\beta + 1) + (m_\gamma + 1) - 1 = m < m + 1$.
In $G$, the vertex associated to gate $\alpha$ will be $t_\alpha = t_\gamma$. The weight of $(s, t_\gamma)$ in
$G$ is indeed equal to the polynomial computed by gate $\alpha$. For vertices $v$ in
$G_\gamma$ the weight of $(s, v)$ in $G$ is not equal to the weight of $(s_\gamma, v)$ in $G_\gamma$, but
as pointed out in [12] this does not matter since these vertices correspond to
non-reusable gates of $C$.

Let $d$, $d_\beta$ and $d_\gamma$ be the formal degrees of the circuits $C$, $C_\beta$ and $C_\gamma$.
By induction hypothesis, $t_\gamma$ is at depth at most $3d_\gamma - 1$ in $G_\gamma$, and $t_\beta$ is
at depth at most $3d_\beta - 1$ in $G_\beta$. In $G$, $t_\gamma$ is therefore at depth at most
$(3d_\beta - 1) + (3d_\gamma - 1) = 3d - 2$. □

Combining Propositions 2 and 3 yields the following result.

Theorem 2 Let $C$ be a circuit of size $t$ and formal degree $d$ where all
multiplication gates are binary. There is an equivalent arithmetic branching
program $G$ of size at most $t^{\log 2d} + 1$ and depth at most $3d - 1$. The edges of
$G$ are labeled by inputs of $C$ or by constants. Moreover, if all the addition
gates of $C$ are ordinary additions or subtractions then these constants are
integers of absolute value at most $2^t$.

4 From Branching Programs to Depth-4 Circuits 

In this section we complete the reduction to circuits of depth 4. 

Lemma 4 Let $G$ be an arithmetic branching program of size $m$ and depth $\delta$,
with edges labeled by elements from some set $S$. There is an $m \times m$ matrix
$M$ such that the polynomial computed by $G$ is equal to the entry at row 1
and column $m$ of the matrix power $M^p$, for any integer $p \ge \delta$. Moreover,
the entries of $M$ are in the set $S \cup \{0, 1\}$.

Proof. Fix a topological ordering of the nodes of $G$, with the source $s$ labeled
1 and the target $t$ labeled $m$. We define $M$ as the adjacency matrix of the
graph $G'$ obtained from $G$ by adding a loop of weight 1 on vertex $t$. In
other words, $M_{mm} = 1$ and in all other cases $M_{ij}$ is the (possibly null)
weight of the edge from node $i$ to node $j$ of $G$. Note that $M$ is upper triangular,
with all diagonal entries equal to 0 except $M_{mm}$. It follows from the classical relation
between matrix powering and paths in graphs that $(M^p)_{1m}$ is equal to the
sum of weights of all $st$-paths of length exactly $p$ in $G'$. This is also the sum
of weights of all $st$-paths of length at most $p$ in $G$, and for $p \ge \delta$ this is the
output of the arithmetic branching program. □
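
The relation between the branching program and the powers of $M$ can be checked numerically on a toy example. This is a sketch under stated assumptions: the helper names are hypothetical, vertices are numbered 1 to m in topological order with source 1 and target m, and integer weights stand in for variables.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(M, p):
    """M raised to the power p by repeated multiplication."""
    n = len(M)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(p):
        result = mat_mul(result, M)
    return result


def abp_matrix(m, edges):
    """Adjacency matrix of the program, with a loop of weight 1 on the target."""
    M = [[0] * m for _ in range(m)]
    for u, v, w in edges:
        M[u - 1][v - 1] += w
    M[m - 1][m - 1] = 1
    return M


x, y = 2, 5  # sample values for two variables
# Paths from 1 to 4: x*x, y*2 and x*3*2, so the output is 4 + 10 + 12 = 26.
edges = [(1, 2, x), (1, 3, y), (2, 3, 3), (2, 4, x), (3, 4, 2)]
M = abp_matrix(4, edges)
print([mat_pow(M, p)[0][3] for p in (2, 3, 5)])
```

This program has depth 3. The entry at row 1 and column 4 equals 26 for p = 3 and p = 5, whereas p = 2 only accounts for the two paths of length at most 2 and gives 14.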

Note that for $p \ge \delta$ all entries of $M^p$ outside the last column are equal to zero,
provided that every vertex of $G$ lies on some path from $s$ to $t$.

In the last step in our series of reductions, we explain (following basically
the same strategy as in [1]) how to perform the matrix powering operation
in the above lemma with depth four formulas, and also depth four circuits.

Proposition 4 Let $G$ be an arithmetic branching program of size $m$ and
depth $\delta$. There is an equivalent depth four circuit $\Gamma$ with $m^2 + 1$ unweighted
addition gates and $m^{\lceil\sqrt{\delta}\rceil + 1} + m^{\lceil\sqrt{\delta}\rceil - 1}$ multiplication gates. There is also an
equivalent depth four formula $\Gamma_f$ with $m^{\lceil\sqrt{\delta}\rceil - 1} + 1$ unweighted addition gates
and $m^{\lceil\sqrt{\delta}\rceil - 1} + m^{2\lceil\sqrt{\delta}\rceil - 2}$ multiplication gates.

The inputs of $\Gamma$ and $\Gamma_f$ are from the same set as the edge labels of $G$, and their
multiplication gates are of fan-in $\lceil\sqrt{\delta}\rceil$.

Proof. We need to compute $M^p$, where $p \ge \delta$ and $M$ is as in Lemma 4.
Let $p$ be the smallest square integer greater than or equal to $\delta$. From $M$ we will
compute $N = M^{\sqrt{p}}$ by a depth 2 circuit $\Gamma_2$, and then from $N$ we will compute
$M^p = N^{\sqrt{p}}$ using the same circuit. With a depth 2 circuit one cannot play
clever tricks: we can only expand a polynomial as a sum of monomials. In
this case we express each entry of $N$ as a sum of $m^{\sqrt{p}-1}$ products of length
$\sqrt{p}$, by brute-force expansion of the product $M^{\sqrt{p}}$. This yields a circuit $\Gamma_2$
with $m^2$ addition gates (one for each entry of $N$) and $m^{\sqrt{p}+1}$ multiplication
gates. We can double those estimates to upper bound the size of $\Gamma$. To arrive
at the slightly better estimate in the statement of Proposition 4, note that
the second copy of $\Gamma_2$ only needs to compute a single entry of $N^{\sqrt{p}}$.

In order to obtain an arithmetic formula, we recompute from scratch each
entry of $N$ whenever it is used by the second copy of $\Gamma_2$. The arithmetic
formula therefore computes a sum of $m^{\sqrt{p}-1}$ products, where each product is
a sum of $m^{\sqrt{p}-1}$ products of entries of $M$. We therefore have one addition and
$m^{\sqrt{p}-1}$ product gates in the top two levels, $m^{\sqrt{p}-1}$ addition and $m^{2(\sqrt{p}-1)}$
multiplication gates in the two bottom levels. □
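
The two-stage expansion behind the depth four circuit can be sketched as follows. This is a minimal illustration under stated assumptions: the toy matrix `M` (a 4-node program of depth 3 with a loop of weight 1 on the target), the integer weights, and the function names are hypothetical; only the counting formulas are taken from the circuit part of the proposition.

```python
from itertools import product
from math import isqrt, prod


def smallest_square_at_least(delta):
    """Return (p, r) with p = r*r the smallest perfect square >= delta."""
    r = isqrt(delta)
    if r * r < delta:
        r += 1
    return r * r, r


def entry_by_expansion(M, k, i, j):
    """Entry (i, j) of M^k written as an explicit sum of m^(k-1) products of k entries."""
    m = len(M)
    total, monomials = 0, 0
    for middle in product(range(m), repeat=k - 1):
        idx = (i, *middle, j)
        total += prod(M[idx[a]][idx[a + 1]] for a in range(k))
        monomials += 1
    return total, monomials


def depth_four_value(M, delta):
    """Compute (M^p)[0][m-1] as N^r with N = M^r, each stage being a sum of products."""
    _, r = smallest_square_at_least(delta)
    m = len(M)
    N = [[entry_by_expansion(M, r, i, j)[0] for j in range(m)] for i in range(m)]
    value, _ = entry_by_expansion(N, r, 0, m - 1)   # a single entry of N^r suffices
    return value


def circuit_gate_counts(m, delta):
    """(additions, multiplications) for the circuit, following the counts in the proof."""
    _, r = smallest_square_at_least(delta)
    return m**2 + 1, m**(r + 1) + m**(r - 1)


# Hypothetical toy matrix: adjacency matrix of a 4-node program, loop on the target.
M = [[0, 2, 3, 0],
     [0, 0, 5, 7],
     [0, 0, 0, 11],
     [0, 0, 0, 1]]

print(depth_four_value(M, 3), circuit_gate_counts(4, 3))
```

The sketch evaluates the circuit rather than building it, but each call to `entry_by_expansion` corresponds to one addition gate fed by `m^(k-1)` multiplication gates of fan-in `k`.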

Note the significant saving in the number of addition gates if we use depth 
four circuits instead of depth four formulas. We can now prove our main 
depth reduction result. 

Theorem 3 Let $C$ be an arithmetic circuit of size $t$ and formal degree $d$
where all multiplication gates are binary. There is an equivalent depth four
circuit $\Gamma$ with at most $(t^{\log 2d} + 1)^2 + 1$ unweighted addition gates and at most
$2(t^{\log 2d} + 1)^{\sqrt{3d}+2}$ multiplication gates.

There is an equivalent arithmetic formula $\Gamma_f$ of depth four with at most
$(t^{\log 2d} + 1)^{\sqrt{3d}} + 1$ unweighted addition gates and at most $2(t^{\log 2d} + 1)^{2\sqrt{3d}}$
multiplication gates. The inputs of $\Gamma$ and $\Gamma_f$ are inputs of $C$ or constants;
their multiplication gates are of fan-in at most $\sqrt{3d} + 1$.

If $C$ is constant-free, and if all the addition gates of $C$ are ordinary
additions or subtractions, then these constants are integers of absolute value
at most $2^t$.

Proof. Combine Theorem 2 and Proposition 4. □

Remark 1 We can obtain smaller circuits for $C$ by going for a constant
depth larger than four. Let $M$ be the matrix in the proof of Proposition 4.
To compute a power $M^p$ we can start from $M$ and raise our matrix to the
power $p^{1/\lambda}$ repeatedly ($\lambda$ times). If we implement each of the $\lambda$ powerings by
a depth 2 circuit, we obtain for the branching program $G$ a circuit of depth
$2\lambda$ and size $m^{O(\delta^{1/\lambda})}$, for any constant $\lambda \ge 2$. For $C$, this translates into a
circuit of depth $2\lambda$ and size $t^{O(d^{1/\lambda} \log d)}$.

If we start from arithmetic formulas (or more generally weakly skew circuits)
instead of general arithmetic circuits, we can obtain depth four formulas and
circuits of smaller size than in Theorem 3. Indeed, in this case we do not need
the transformation from arithmetic circuits to weakly skew circuits given by
Proposition 2. This saves a factor of roughly $\log 2d$ in the exponents of our
complexity bounds.

Theorem 4 Let $C$ be a weakly skew circuit of size $t$ and formal degree $d$.
There is an equivalent depth four circuit $\Gamma$ with at most $(t+1)^2 + 1$ unweighted
addition gates and at most $2(t+1)^{\sqrt{3d}+2}$ multiplication gates.

There is an equivalent arithmetic formula $\Gamma_f$ of depth four with at most
$(t+1)^{\sqrt{3d}} + 1$ unweighted addition gates and at most $2(t+1)^{2\sqrt{3d}}$
multiplication gates. The inputs of $\Gamma$ and $\Gamma_f$ are inputs of $C$ or constants; their
multiplication gates are of fan-in at most $\sqrt{3d} + 1$.

If $C$ is constant-free, and if all the addition gates of $C$ are ordinary
additions or subtractions, then these constants are integers of absolute value
at most $2^t$.

Proof. Before applying Proposition 3 we make sure that any input to an
addition gate of $C$ is an input of the circuit or a multiplication gate. By
Lemma 3 this condition can be ensured without increasing the size of $C$ (and
this transformation preserves weak skewness). Hence there is an equivalent
arithmetic branching program of size at most $t+1$ and depth at most $3d - 1$.
Then we convert this branching program into a depth 4 circuit or a depth 4
formula using Proposition 4.
When $C$ is constant-free, the bound on the absolute value of the constants
of $\Gamma$ and $\Gamma_f$ comes (as in Theorem 3) from property (iii) in Lemma 3. □

The savings in the number of addition gates in depth four circuits com- 
pared to depth four formulas are especially significant in the above theorem: 
our circuits contain only quadratically many addition gates. This is a rel- 
evant parameter since the number of addition gates (minus 1) is equal to 
the number of distinct sparse polynomials in a sum of products of sparse 
polynomials [10].

5 Depth Reduction for VP 

In accordance with Definition 1, a unary weighted addition gate outputs $a \cdot x$,
where $a$ is the weight of the gate and $x$ its input. Recall also from the
definition of formal degree in Section 2 that the formal degree of such a gate
is equal to that of its input.

The following result is essentially Lemma 2 from [11], written in a dif- 
ferent language. We give the proof because we will build on it in the next 
section. 

Proposition 5 Any VP family $(f_n)$ can be computed by a polynomial-size
family $(C_n)$ of circuits of formal degree $\deg(f_n)$. The addition gates of $C_n$
are unary weighted or binary unweighted (i.e., "ordinary").

Proof. Since $(f_n)$ is in VP, this family can be computed by a family $(C'_n)$
of arithmetic circuits of polynomial size where all the arithmetic gates are
binary unweighted. To construct $C_n$ from $C'_n$ we use a small variation on
the standard homogenization trick. In order to homogenize $C'_n$ one would
normally represent each gate $\gamma$ computing a polynomial $f_\gamma$ by a sequence $\gamma_i$
of $d_n + 1$ gates, where $i$ ranges from 0 to $d_n$ and $\gamma_i$ computes the homogeneous
component of $f_\gamma$ of degree $i$. The homogeneous components of degree
higher than $d_n$ can be discarded since they cannot contribute to the final
output. This construction preserves polynomial circuit size, and each gate
now computes a polynomial of degree at most $d_n$. But formal degree can be
higher due to multiplication by constants (i.e., homogeneous components of
degree 0).

To circumvent this difficulty, we get rid of the gates $\gamma_0$ representing
homogeneous components of degree 0. We will therefore construct a circuit $C''_n$
which computes the sum of all homogeneous components of $f_n$ of degree at
least 1. Our final circuit $C_n$ will then add the output of $C''_n$ to the constant
term of $f_n$, at the cost of one additional arithmetic operation.
We will use unary weighted addition gates inside $C''_n$. Indeed, let $\gamma$ be a
multiplication gate of $C'_n$ with inputs $\alpha$ and $\beta$. To obtain $f_{\gamma,i}$, the homogeneous
component of degree $i$, one normally writes $f_{\gamma,i} = \sum_{j=0}^{i} f_{\alpha,j} f_{\beta,i-j}$. This
expression involves $f_{\alpha,0}$ and $f_{\beta,0}$, which as we have said are not represented
by any gate of $C''_n$. Therefore, to compute e.g. $f_{\alpha,0} f_{\beta,i}$, instead of a
multiplication gate we use a unary addition gate with input $f_{\beta,i}$ and weight $f_{\alpha,0}$.
A straightforward induction shows that a gate $\gamma_i$ in $C''_n$ will have formal
degree $i$. As a result, $C''_n$ and $C_n$ will be of formal degree $d_n$. □
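
The treatment of degree-0 components in this proof can be illustrated on univariate polynomials, where the homogeneous component of degree $i$ is just the coefficient of $x^i$. The following Python sketch rests on that simplifying assumption; the function names and the sample coefficient lists are hypothetical and only mirror the product rule quoted above.

```python
def product_components(fa, fb, d):
    """Components 0..d of a product, by the usual rule f_{g,i} = sum_j fa[j] * fb[i-j]."""
    return [sum(fa[j] * fb[i - j] for j in range(i + 1)) for i in range(d + 1)]


def product_components_split(fa, fb, d):
    """Components 1..d of the same product, separating the two kinds of terms.

    Terms involving a degree-0 component (fa[0] or fb[0]) are scalings by a
    constant: these are the ones handled by unary weighted addition gates.
    The remaining terms (1 <= j <= i-1) are the genuine multiplication gates.
    """
    out = {}
    for i in range(1, d + 1):
        scaled = fa[0] * fb[i] + fb[0] * fa[i]
        genuine = sum(fa[j] * fb[i - j] for j in range(1, i))
        out[i] = scaled + genuine
    return out


# Hypothetical inputs: fa = 2 + x + 3x^3 and fb = 5 + 4x^2 + x^3, truncated at degree 3.
fa = [2, 1, 0, 3]
fb = [5, 0, 4, 1]
full = product_components(fa, fb, 3)
split = product_components_split(fa, fb, 3)
print(full, split)
```

Every term counted as `genuine` multiplies two components of degree at least 1, which is why the formal degree of the gate for component $i$ stays equal to $i$.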

Theorem 5 Let $(f_n)$ be a VP family of polynomials of degree $d_n = \deg(f_n)$.
This family can be computed by a family $(\Gamma_n)$ of depth four circuits with
$n^{O(\log d_n)}$ addition gates and $n^{O(\sqrt{d_n} \log d_n)}$ multiplication gates. The family
$(f_n)$ can also be computed by a family $(F_n)$ of depth four arithmetic formulas
of size $n^{O(\sqrt{d_n} \log d_n)}$. The inputs to $\Gamma_n$ and $F_n$ are variables of $f_n$ or
constants; their multiplication gates are of fan-in at most $\sqrt{3d_n} + 1$.

Proof. This is an application of Theorem 3: $t$ is polynomial in $n$, and by
Proposition 5 we can take $d = d_n$. □

6 Depth Reduction for $\mathrm{VP}^0$

We first show that a circuit of small size and degree where all inputs are in
$\{-1, 0, 1\}$ cannot compute a large integer.

Lemma 5 Let $C$ be a constant-free and variable-free circuit of size $t$ and
formal degree $d$ where all arithmetic gates are binary unweighted. The output
of $C$ is an integer of absolute value at most $2^{td}$.

Proof. By induction on $t$. For $t = 1$ the circuit contains a single input gate,
which must carry an integer in $\{-1, 0, 1\}$. The result is therefore true for
$t = 1$. Consider now a circuit $C$ of size $t \ge 2$, and let $d_1$ and $d_2$ be the formal
degrees of the two inputs to the output gate. By induction hypothesis these
two gates carry integers of absolute value at most $2^{(t-1)d_1}$ and $2^{(t-1)d_2}$. If
the output gate is an addition we have $d_1, d_2 \le d$ and $C$ therefore computes
an integer of absolute value at most $2^{(t-1)d} + 2^{(t-1)d} \le 2^{td}$. If the output
gate is a multiplication, we have $d = d_1 + d_2$ and $C$ computes an integer of
absolute value at most $2^{(t-1)d_1} \times 2^{(t-1)d_2} \le 2^{td}$. □
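
The bound of Lemma 5 can be tested empirically on small random circuits. The Python sketch below makes several assumptions that are not stated in this section: a circuit is encoded as a list of gates in topological order, its size is the length of that list, and input gates are given formal degree 1. The encoding and the helper names are hypothetical.

```python
import random


def evaluate(circuit):
    """Return (value, formal degree) of the last gate.

    Gates are ('in', c) with c in {-1, 0, 1}, or ('+', a, b) / ('*', a, b)
    where a and b are indices of earlier gates.
    """
    vals, degs = [], []
    for gate in circuit:
        if gate[0] == "in":
            vals.append(gate[1])
            degs.append(1)                          # assumed convention for inputs
        elif gate[0] == "+":
            _, a, b = gate
            vals.append(vals[a] + vals[b])
            degs.append(max(degs[a], degs[b]))
        else:
            _, a, b = gate
            vals.append(vals[a] * vals[b])
            degs.append(degs[a] + degs[b])
    return vals[-1], degs[-1]


def random_circuit(size, rng):
    """Random constant-free, variable-free circuit with `size` gates."""
    circuit = [("in", rng.choice([-1, 0, 1]))]
    while len(circuit) < size:
        kind = rng.choice(["in", "+", "*"])
        if kind == "in":
            circuit.append(("in", rng.choice([-1, 0, 1])))
        else:
            a = rng.randrange(len(circuit))
            b = rng.randrange(len(circuit))
            circuit.append((kind, a, b))
    return circuit


# Repeated squaring: 1, 1+1 = 2, 2*2 = 4, 4*4 = 16, with formal degree 4.
squaring = [("in", 1), ("+", 0, 0), ("*", 1, 1), ("*", 2, 2)]
value, degree = evaluate(squaring)
print(value, degree, abs(value) <= 2 ** (len(squaring) * degree))
```

Such a check is of course no substitute for the induction; it only confirms that the encoding of size and formal degree used in the sketch is consistent with the statement.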

Proposition 6 Any $\mathrm{VP}^0$ family $(f_n)$ can be computed by a family $(C_n)$ of
constant-free circuits of polynomial size and formal degree $\deg(f_n)$. The
arithmetic gates of $C_n$ are binary multiplication, ordinary addition or
subtraction gates.
Proof. Since $(f_n)$ is in $\mathrm{VP}^0$, this family can be computed by a family $(C'_n)$
of constant-free circuits of polynomial size and polynomial formal degree.
All the arithmetic gates of $C'_n$ can be assumed to be binary unweighted. To
construct $C_n$ from $C'_n$ we proceed along the same lines as in Proposition 5.
In particular, we will again construct a circuit $C''_n$ which computes the sum
of all homogeneous components of $f_n$ of degree at least 1. Our final circuit
$C_n$ then adds the output of $C''_n$ to the constant term of $f_n$ (call it $c_n$). By
Lemma 5, $c_n$ has polynomial bit size (it is equal to the output of $C'_n$ when
all variables are set to 0). We can therefore compute $|c_n|$ from scratch using
a sequence of multiplications by 2 and additions of bits. We use an addition
to perform a multiplication by 2, so this construction does not require any
multiplication gate. Finally, depending on the sign of $c_n$ we add $|c_n|$ to, or
subtract it from, the output of $C''_n$. The resulting circuit $C_n$ will have the same
formal degree as $C''_n$.

We also need to use a similar trick inside $C''_n$. Indeed, let $\gamma$ be a
multiplication gate of $C'_n$ with inputs $\alpha$ and $\beta$. To obtain $f_{\gamma,i}$, the homogeneous
component of degree $i$, one normally writes $f_{\gamma,i} = \sum_{j=0}^{i} f_{\alpha,j} f_{\beta,i-j}$. This
expression involves $f_{\alpha,0}$ and $f_{\beta,0}$, which as explained in the proof of
Proposition 5 are not represented by any gate of $C''_n$. Therefore, to compute e.g.
$f_{\alpha,0} f_{\beta,i}$ we start from $f_{\beta,i}$ and compute the product using a sequence of
multiplications by 2 and additions of $f_{\beta,i}$. As explained above, thanks to
Lemma 5 this can be done with a polynomial number of addition gates, at
most one subtraction and no multiplication gate. A straightforward induction
shows that a gate $\gamma_i$ in $C''_n$ will have formal degree $i$. As a result, $C''_n$
and $C_n$ will be of formal degree $d_n$. □
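
The replacement of a multiplication by an integer constant with doublings and additions can be sketched as follows. This is a minimal illustration assuming a binary (Horner-style) scan of the constant; the function names are hypothetical, and the Python integer `x` stands in for the value carried by the gate being scaled.

```python
def scale_by_additions(x, c):
    """Return (|c| * x, number of additions), using only x + x style doublings
    and additions of x; no multiplication is performed on x."""
    if c == 0:
        return 0, 0
    bits = bin(abs(c))[2:]          # binary digits of |c|, most significant first
    acc, additions = x, 0
    for bit in bits[1:]:
        acc = acc + acc             # multiplication by 2 done as one addition
        additions += 1
        if bit == "1":
            acc = acc + x
            additions += 1
    return acc, additions


def add_scaled(total, x, c):
    """Return (total + c * x, gates used): the sign of c decides between a final
    addition and a final subtraction."""
    scaled, additions = scale_by_additions(x, c)
    if c >= 0:
        return total + scaled, additions + 1
    return total - scaled, additions + 1


print(scale_by_additions(7, 13), add_scaled(100, 7, -13))
```

The number of additions is at most twice the bit length of the constant, which is what makes a polynomial bit size bound on the constant sufficient.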

By Proposition 1, one can get rid of the subtraction gates in Proposition 6
at the cost of a linear increase in circuit size and an increase in the formal
degree by just 1 (using Lemma 3 from [11] instead of Proposition 1 would
give a worse degree bound).

Theorem 6 Let $(f_n)$ be a $\mathrm{VP}^0$ family of polynomials of degree $d_n = \deg(f_n)$.
This family can be computed by a family $(\Gamma_n)$ of depth four circuits with
$n^{O(\log d_n)}$ addition gates and $n^{O(\sqrt{d_n} \log d_n)}$ multiplication gates. The family
$(f_n)$ can also be computed by a family $(F_n)$ of depth four arithmetic formulas
of size $n^{O(\sqrt{d_n} \log d_n)}$. The inputs to $\Gamma_n$ and $F_n$ are variables of $f_n$ or
integers of polynomial bit size; their multiplication gates are of fan-in at most
$\sqrt{3d_n} + 1$.

Proof. This is an application of Theorem 3: $t$ is polynomial in $n$, and by
Proposition 6 we can take $d = d_n$. □

7 Application to Boolean Circuits



In this section we give an application of our results to boolean circuit
complexity. A discussion of depth reduction in the boolean versus arithmetic
setting can already be found in [1], but that paper did not actually provide
any result of this type. Here we use arithmetic techniques to reprove a known
result: languages in LOGCFL have nontrivial constant-depth circuits.

Proposition 7 Let $L$ be a language in LOGCFL. For every $\varepsilon > 0$, $L$ can
be decided by a family of constant-depth circuits $\Gamma_n$ of size $2^{n^\varepsilon}$. The gates of
$\Gamma_n$ are OR or AND gates, both of unbounded fan-in, and NOT gates.

Proof Sketch. It is known that languages in LOGCFL can be recognized by
families $(C_n)$ of semi-unbounded circuits of logarithmic depth and polynomial
size [25]. Each circuit $C_n$ has $2n$ inputs; the remaining gates are AND
gates of fan-in 2 or OR gates of unbounded fan-in. A language $L$ in LOGCFL
is recognized by the corresponding circuit family in the following sense: a
word $x \in \{0, 1\}^n$ belongs to $L$ iff the input $x_1 \cdots x_n \overline{x_1} \cdots \overline{x_n}$ is accepted by
$C_n$.

We view $C_n$ as an arithmetic circuit over the boolean semiring $\mathcal{R} =
(\{0, 1\}, \vee, \wedge)$: the boolean OR is the addition of $\mathcal{R}$, and the boolean AND
is its multiplication. The semi-unboundedness property together with the
$O(\log n)$ depth bound imply that $C_n$ is of polynomially bounded formal
degree. It follows that we can apply the results of Section 4 (up to now we
have considered only arithmetic circuits over fields, but the main results and
their proofs apply to semirings). The existence of a suitable constant-depth
circuit family $(\Gamma_n)$ therefore follows from Remark 1. Note that the depth of
$\Gamma_n$ depends on the exponent in the polynomial bound for the formal degree
of $C_n$. □

Remark 2 Instead of working over the semiring $\mathcal{R}$ in the above proof, one
could also work over $(\mathbb{N}, +, \times)$. To do this replace each OR gate of $C_n$ by a $+$
gate and each AND gate by a $\times$ gate; apply Remark 1 to the resulting circuit;
and finally convert back addition gates into OR gates and multiplication gates
into AND gates.
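
The reason the detour through $(\mathbb{N}, +, \times)$ is harmless is that, on 0/1 inputs, a monotone circuit evaluates to a positive integer exactly when its boolean version evaluates to true. The Python sketch below checks this on one small circuit; the gate encoding and the sample circuit are hypothetical, and the depth reduction step itself is not modeled.

```python
from itertools import product
from operator import add, and_, mul, or_


def evaluate(circuit, inputs, plus, times):
    """Evaluate a circuit whose OR gates use `plus` and whose AND gates use `times`.

    Gates are (kind, args) where args are indices into the list formed by the
    inputs followed by the gates already evaluated.
    """
    vals = list(inputs)
    for kind, args in circuit:
        op = plus if kind == "OR" else times
        acc = vals[args[0]]
        for a in args[1:]:
            acc = op(acc, vals[a])
        vals.append(acc)
    return vals[-1]


# Hypothetical circuit on inputs 0, 1, 2: gates 3..6, output is gate 6.
CIRCUIT = [("OR", (0, 1, 2)), ("AND", (0, 1)), ("AND", (3, 2)), ("OR", (4, 5))]

agree = all(
    (evaluate(CIRCUIT, bits, add, mul) > 0) == bool(evaluate(CIRCUIT, bits, or_, and_))
    for bits in product((0, 1), repeat=3)
)
print(agree)
```

Negated inputs are treated here as ordinary inputs, as in the semi-unbounded circuits of the proof sketch.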

One can find in Lemma 8.1 of [2] a proof of Proposition 7 for languages
in NL (a subset of LOGCFL), and the authors observe that the proof also
applies to LOGCFL. According to [26], the result for NL is usually credited
to Nepomnjascii [14]. Nepomnjascii proved a uniform version of this result
which in recent years has been used in time-space lower bounds (see [24] for
a survey on this topic). The result for languages in L was used in [7] to
construct certain uniform families of expanders.

Another depth reduction result due to Valiant shows that boolean circuits
of linear size and depth $O(\log n)$ have depth-3 circuits of size $2^{O(n/\log\log n)}$.
This result is stated in [22] for monotone circuits. The statement for
non-monotone circuits (and a proof based on [20, 22]) can be found in [26]. All
these results suggest that lower bounds on the size of circuits of logarithmic
depth might be obtained by proving strong enough lower bounds for
constant-depth circuits (and quite possibly explain why it is difficult to obtain
very strong lower bounds for constant-depth circuits).

8 Reduction to Polylogarithmic Depth 

It was shown by Valiant, Skyum, Berkowitz and Rackoff [23] that arithmetic
circuits of polynomially bounded size and degree can be transformed into
circuits of polylogarithmic depth and polynomial size (the depth can even
be made logarithmic with addition gates of unbounded fan-in). Since then
several refinements of this fundamental result have been published, addressing
in particular the issues of uniformity [13, 3] or multilinearity [16]. In this
section we give another proof of reduction to polylogarithmic depth. The
depth bound that we obtain is worse than that of [23] by a logarithmic factor. This
result is therefore neither new nor optimal, but nonetheless we feel that it is
worth presenting here because its proof is quite simple and based on the same
tools as the remainder of the paper: (weakly) skew circuits and arithmetic
branching programs.

Before turning to general arithmetic circuits, we first parallelize arith- 
metic branching programs. 

Proposition 8 Let $G$ be a (multi-output) arithmetic branching program of
size $m$ and depth $\delta$. There is a multi-output arithmetic circuit $C$ of depth
$2\lceil\log\delta\rceil$ which computes the $m$ polynomials represented by the $m$ nodes of $G$.
The circuit contains $m^3\lceil\log\delta\rceil$ binary multiplication gates and $m^2\lceil\log\delta\rceil$
addition gates of unbounded fan-in.

Proof. It is again based on matrix powering. We start from the adjacency
matrix of $G$, and add the identity matrix (instead of a single 1 on the diagonal
as in the proof of Lemma 4). Let $M$ be the resulting matrix. Assuming
again that the source node of $G$ is labeled 1, the polynomial represented by
node $j$ of $G$ is equal to $(M^p)_{1j}$ for any power $p \ge \delta$. We will compute $M^p$
by repeated squaring. From $M$ we can compute $M^2$ by a depth 2 circuit
with $m^3$ multiplication gates and $m^2$ unbounded additions. We repeat this
process $\lceil\log\delta\rceil$ times to obtain $M^p$. □

Theorem 7 Let $C$ be a circuit of size $t$ and formal degree $d$ where all
multiplication gates are binary. There is an equivalent circuit $C'$ (with binary
multiplication gates as well) of depth $O(\log t \cdot \log d)$ and size $O(t^3 \log t \cdot \log d)$.

Proof. We decompose $C$ in "layers" $C_i$: $C_i$ is made of all gates of $C$ of formal
degree in the interval $[2^i, 2^{i+1})$. Here $i$ ranges from 0 to $\lfloor\log d\rfloor$. Each layer
forms a (multi-output) arithmetic circuit; for $i \ge 1$, the input gates of $C_i$
actually belong to previous $C_j$'s for various $j < i$. The crucial observation
is that these arithmetic circuits are all skew, i.e., for each multiplication
gate at least one of the two arguments is an input gate of $C_i$. Indeed, the
product of two gates of formal degree at least $2^i$ is of formal degree at least
$2^{i+1}$ and therefore cannot belong to $C_i$. But (as pointed out at the end of
Section 2) skew circuits and arithmetic branching programs are essentially
equivalent objects. In particular, by Lemma 5 of [12] a skew circuit (or even
a weakly skew circuit) of size $s$ can be simulated by an arithmetic branching
program of size $s + 1$ (this result of [12] is stated only for circuits with binary
addition gates, but the proof clearly applies to unbounded fan-in as well).
By Proposition 8 each $C_i$ is therefore equivalent to a circuit of depth $O(\log t)$
and size $O(t^3 \log t)$. We multiply these estimates by $1 + \lfloor\log d\rfloor$ to obtain the
final result. □
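
The skewness of the layers can be checked mechanically. The Python sketch below assigns each gate to the layer given by the integer part of the base-2 logarithm of its formal degree and verifies that no multiplication gate has both arguments in its own layer. The gate encoding, the convention that inputs have formal degree 1, and the helper names are assumptions made for this illustration.

```python
import random


def formal_degrees(circuit):
    """Formal degree of every gate; gates are (kind, args) in topological order."""
    degs = []
    for kind, args in circuit:
        if kind == "in":
            degs.append(1)                              # assumed convention for inputs
        elif kind == "+":
            degs.append(max(degs[a] for a in args))     # unbounded fan-in addition
        else:
            degs.append(degs[args[0]] + degs[args[1]])  # binary multiplication
    return degs


def layer(deg):
    """Index i such that 2**i <= deg < 2**(i + 1)."""
    return deg.bit_length() - 1


def layers_are_skew(circuit):
    """True if every multiplication gate has at most one argument in its own layer."""
    degs = formal_degrees(circuit)
    for k, (kind, args) in enumerate(circuit):
        if kind == "*":
            same_layer = [a for a in args if layer(degs[a]) == layer(degs[k])]
            if len(same_layer) > 1:
                return False
    return True


def random_circuit(size, rng):
    """Random circuit with `size` gates, for testing purposes only."""
    circuit = [("in", ())]
    while len(circuit) < size:
        kind = rng.choice(["in", "+", "*"])
        if kind == "in":
            circuit.append(("in", ()))
        elif kind == "+":
            fan_in = rng.randint(1, 3)
            circuit.append(("+", tuple(rng.randrange(len(circuit)) for _ in range(fan_in))))
        else:
            circuit.append(("*", (rng.randrange(len(circuit)), rng.randrange(len(circuit)))))
    return circuit


example = [("in", ()), ("in", ()), ("*", (0, 1)), ("+", (2, 0)), ("*", (3, 2)), ("*", (4, 1))]
print(formal_degrees(example), layers_are_skew(example))
```

In the example the last gate has formal degree 5 and lies in the layer of degrees 4 to 7 together with one of its arguments, while its other argument has formal degree 1 and acts as an input of that layer.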